A function for quality control of marker data. It can count or remove SNPs that repeat their neighbor and markers with a minor allele frequency (MAF) below a given threshold. It can also impute missing genotypes.
Numeric matrix containing the genotypic data: $n$ rows of observations and $m$ columns of molecular markers. SNPs must be coded as 0, 1, 2, for founder homozygous, heterozygous and reference homozygous. NA is allowed.
psy
Tolerance parameter for repeated markers. Default is 1, which removes only SNPs that are 100% identical to their following neighbor.
MAF
Minor allele frequency threshold. Default is 0.05. Markers with MAF below this threshold are reported or removed.
remove
Logical (TRUE/FALSE). If TRUE, remove SNPs that are redundant or have low MAF.
impute
If TRUE, impute missing values using Random Forest as implemented in the package missForest. Methods are described in Rutkoski et al. (2013).
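The `psy` and `MAF` filters described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy matrix, the variable names, and the way agreement between neighboring markers is measured (proportion of individuals with identical genotypes, ignoring NA) are assumptions made for illustration only.

```r
# Toy genotype matrix: 4 individuals (rows) x 5 markers (columns), coded 0/1/2.
# All values are made up. matrix() fills column by column, so each row of
# numbers below is one marker.
gen <- matrix(c(0, 2, 2, 0,    # snp1
                0, 2, 2, 0,    # snp2: identical to snp1
                0, 0, 0, 0,    # snp3: monomorphic, MAF = 0
                1, 2, NA, 0,   # snp4: one missing genotype
                2, 1, 0, 2),   # snp5
              nrow = 4,
              dimnames = list(NULL, paste0("snp", 1:5)))

psy <- 1      # tolerance for repeated neighbors (1 = fully identical)
MAF <- 0.05   # minor allele frequency threshold

# Allele frequency of the allele counted by the 0/1/2 coding, then the MAF.
p <- colMeans(gen, na.rm = TRUE) / 2
maf <- pmin(p, 1 - p)
low_maf <- maf < MAF

# Assumed agreement measure: share of individuals whose genotype at marker j
# equals their genotype at marker j + 1. The last marker has no following
# neighbor, so it is never flagged.
m <- ncol(gen)
agreement <- vapply(seq_len(m - 1),
                    function(j) mean(gen[, j] == gen[, j + 1], na.rm = TRUE),
                    numeric(1))
repeated <- c(agreement >= psy, FALSE)

# Counting only (comparable to remove = FALSE).
n_repeated <- sum(repeated)
n_low_maf <- sum(low_maf)

# Dropping flagged markers (comparable to remove = TRUE).
filtered <- gen[, !(repeated | low_maf), drop = FALSE]
```

In this toy data, snp1 is flagged because it matches snp2 in every individual, and snp3 is flagged because its MAF is 0, so `filtered` keeps snp2, snp4 and snp5.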
Value
Returns the genotypic matrix without missing values, redundant markers or low-MAF markers.
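For readers who want to see what Random Forest imputation with missForest looks like on its own, here is a hedged sketch. It calls `missForest::missForest()` directly on simulated data; the simulated genotypes, the variable names, and the final rounding back to 0/1/2 are assumptions of this example, not a description of how the function handles imputation internally.

```r
# Simulated genotypes (0/1/2) for 40 individuals and 6 correlated markers.
# All values are made up for illustration.
set.seed(1)
base <- rbinom(40, size = 2, prob = 0.4)
gen <- sapply(1:6, function(j) {
  # Replace about 10% of genotypes so markers are correlated, not identical.
  flip <- rbinom(40, size = 1, prob = 0.1) == 1
  ifelse(flip, rbinom(40, size = 2, prob = 0.4), base)
})
colnames(gen) <- paste0("snp", 1:6)

# Set 15 randomly chosen genotypes to missing.
gen[sample(length(gen), 15)] <- NA

# Run only if the missForest package is installed.
if (requireNamespace("missForest", quietly = TRUE)) {
  # missForest() returns a list; the completed data set is in $ximp.
  # Warnings are suppressed because randomForest warns about regression
  # on a response with few unique values.
  fit <- suppressWarnings(missForest::missForest(as.data.frame(gen)))

  # Numeric columns are imputed by regression, so imputed values can be
  # fractional. Rounding to the nearest genotype class is a choice made
  # for this sketch.
  imputed <- round(as.matrix(fit$ximp))
}
```

An alternative design is to convert each marker to a factor before calling `missForest()`, which makes the forest treat genotypes as classes and avoids the rounding step.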
References
Rutkoski, J. E., Poland, J., Jannink, J. L., & Sorrells, M. E. (2013). Imputation of unordered markers and the impact on genomic selection accuracy. G3: Genes|Genomes|Genetics, 3(3), 427-439.